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ABSTRACT 

The bispectrum of the microwave background sky is a possible discriminator between 
inflationary and defect models of structure formation in the Universe. The bispectrum, 
which is the analogue of the temperature 3-point correlation function in harmonic 
space, is zero for most inflationary models, but non-zero for non-gaussian models. The 
expected departures from zero are small, and easily masked by noise, so it is impor- 
tant to be able to estimate the bispectrum coefficients as accurately as possible, and 
to know the errors and correlations between the estimates so they may be used in 
combination as a diagnostic to rule out non-gaussian models. This paper presents a 
method for estimating in an unbiased way the bispectrum from a microwave back- 
ground map in the near-gaussian limit. The method is optimal, in the sense that no 
other method can have smaller error bars, and in addition, the covariances between 
the bispectrum estimates are calculated explicitly. The method deals automatically 
with partial sky coverage and arbitrary noise correlations without modification. A 
preliminary application to the Cosmic Background Explorer 4-year dataset shows no 
evidence for non-gaussian behaviour. 



1 INTRODUCTION 



The question of whether or not the microwave background 
sky is well-approximated by a gaussian random field is 
important for distinguishing inflationary and defect mod- 
els of the early Universe. Inflationary models predict im- 
measurably small non-gaussian components, arising from 
gravity waves (Bharadwaj, Munshi & Souradeep 1997) 
and the Rees-Sciama effect (Mollerach et al. 1995, Munshi, 
Souradeep & Starobinsky 1995) . The difficulty which besets 
such tests is that the predicted departures from gaussian be- 
haviour for most defect models are small (e.g. Luo 1994a), 
and are correspondingly difficult to detect in the presence of 
noise, which may be instrumental or cosmic variance. This 
makes it extremely important to be able to calculate the 
statistical properties of the non-gaussian discriminants. In 
particular, it will probably be necessary to combine the re- 
sults of a large number of estimates, to obtain a statistically 
significant departure from gaussianity (if it exists), or to 
make a convincing case that the sky is indeed gaussian. 

Ironically, the best evidence for an inflationary model 
may well come not from a specific test for non-gaussianity, 
but rather from the power spectrum. For inflationary mod- 
els, the power spectrum of temperature fluctuations is pre- 
dicted to have a reasonable amount of structure in it, with 
multiple acoustic oscillation peaks which should be measur- 
able by future satellites such as MAP and Planck (Jungman 
et al. 1996, Bersanelli et al. 1996). Should such structures be 
found, and found to agree with the inflationary model pre- 
dictions within the errors, the case against any form of defect 
model would be strong, and even the present knowledge of 
the power spectrum appears already to rule out many defect 



models (Pen, Seljak & Turok 1997). In this case, the absence 
of non-gaussian signatures would be a useful confirmation. 
However, if the power spectrum turns out not to be well- 
fitted by inflationary models, the question of the gaussian 
nature or otherwise of the fluctuations becomes correspond- 
ingly more important. There are many ways to approach the 
problem of determining whether fluctuations are gaussian, 
in large-scale structure and in the CMB. These include the 
3-point function (e.g. Hinshaw et al. 1994, Falk, Rangara- 
jan & Srednicki 1993, Luo & Schramm 1993, Gangui et al. 
1994), the genus and Euler-Poincare statistic (Coles 1989, 
Gott et al. 1990, Luo 1994b, Smoot et al. 1994), peak statis- 
tics (Bond & Efstathiou 1987, Kogut et al. 1995, Kogut et al. 
1996) and studies of tensor modes in the CMB (Coulson, 
Crittenden & Turok 1994) . The approach we take here is to 
investigate the bispectrum, for which some studies have been 
made in the large-scale structure literature (Hivon et al. 
1995, Matarrese, Verde & Heavens 1997, Verde, Heavens & 
Matarrese 1998) and in the CMB (Luo 1994a). There may be 
sharper tools for detecting specific non-gaussian models, but 
the rationale for this approach is that the bispectrum offers a 
generic test for non-gaussian models, in the following sense: 
a general field will have non-zero connected n-point func- 
tions at all orders, and the bispectrum is the lowest statistic 
(with n > 1) for which a gaussian field has zero expectation 
value. 

The principal advantages of the approach detailed in 
this paper are that it deals automatically with masked re- 
gions of the sky and correlated noise, and that the estimates 
of the bispectrum coefficients come with error bars and co- 
variances between the errors. This last point is particularly 
important when one realises that a single bispectrum coef- 
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ficient estimate is unlikely to rule out a model, because the 
cosmic variance is often larger than the expected signal, and 
so one is going to need many coefficients in practice. A final 
point is that, for gaussian fluctuations, the estimator below 
cannot be improved, in the sense of having a smaller error 
bar. 



2 METHOD 

The optimisation procedure in this paper is a generalisation 
of the optimal quadratic estimator for the power spectrum, 
presented by Tegmark (1997). For consistency, we follow his 
notation as far as possible. Let Xi be the temperature fluc- 
tuation AT/T, in some sky pixel i. The temperature map is 
expanded in spherical harmonics Y™ in the usual way: 



AT 



(n) ~^2xiYr'{ei,^)Asu 



(i) 



where dQ, Atli represent elements of solid angle, and 8 and 
<f> are polar coordinates. The power spectrum is defined as 



Ci = <k m | 2 > 



(2) 



where the angle brackets indicate an ensemble average. Ex- 
pectation values of products of distinct spherical harmonic 
coefficients are zero by isotropy, independently of whether 
the temperature map is gaussian or not. The bispectrum is 
defined as 



B(£ 1 £ 2 £3,mim2m3) 



(3) 



ft is zero, unless the indices comply with the following tri- 
angle closure constraints (e.g. Edmonds 1957, Luo 1994a): 
m 1 +m 2 +m s = 0; £i+£ 2 + £s =evcn; \£i-£j\ < 4 < ti+lj 
for i,j,k — 1, 2, 3. 

We seek an estimator of B which is lossless, if possible, 
in the sense that it contains as much information as the 
original map {xi}. Ideally it should be unbiased, and with 
calculable statistical properties. In the spirit of Tegmark's 
optimal quadratic estimator for the power spectrum, we seek 
an estimator for the bispectrum which is cubic. We consider 
quantities y a of the following form 



E 

pixels ijk 



ijk -X'i%j%k • 



(4) 



We will find that the y a are related to the bispec- 
trum estimates, but will not be the bispectrum estimates 
themselves. We introduce the shorthand notation a = 
{£i,£2,£3,mi,m2,m,3}, and we also combine the list of data 
triplets into a data vector with elements labelled by A: 



D A 



X%X jXfc 



(5) 



where A represents some triplet k}. The Ef jk are some 
coefficients to be determined. The mean of y a involves the 
3-point function, which may be written in terms of the bis- 
pectra as follows: 



fl A = {XiXjX k } = B a Rijk 



(6) 



where we have assumed that the noise has a zero 3-point 
function. If it is known and non-zero, it may be added. The 



functions connecting the 3-point functions in real and har- 
monic space are (Gangui et al. 1994): 

Rijk = lN(^ njknkl )W ei W e2 Wt 3 

x Pe 4 (cosfi j )Pe 5 (cos'yjk)Pe 6 (cosfki) 

TTi4Tn 5 m 6 



Trm 4 m, 6 m 1 rrm 5 m 4 m2 Trm s m 6 m 3 

x rl e i e 6 e 1 "(5(4(2 11 €5*6*3 



where 



TTm.1m.2m3 _ I 
"(1(2(3 ~ / 



(7) 



(8) 



and can be related to Clebsch-Gordon coefficients. The effect 
of beam-smearing (here modelled by a gaussian) is through 
the window functions 

W e = exp [-£(£+ l)a 2 /2] . (9) 
fij is the angle between pixels i and j, and 



1 - cos - cos 7 jfc - cos 7 ifc + 



2 cos 7ij cos fjk cos 7ife. 



(10) 



2.1 Optimal estimator y a 

We wish to minimise the variance of y (cf Tegmark 1997 for 
power spectrum), which involves the 6-point function. The 
means are 



(Va) — ^2 B a 'RijkE?jk- 



(11) 



a' ijk 



The covariance beween the ys is C aa i = (y a y a '} — {Va){y a ') 
which we obtain from the triplet data covariance matrix: 

(XiXjXkXi/Xj/Xy) - {x l X j X k ){x 

)■ (12) 

We now make an assumption concerning the departures from 
gaussianity. Since these are expected to be small, we approx- 
imate the covariance matrix by the covariance matrix for a 
gaussian field with the same power spectrum. This assumes 
that the bispectrum is small compared with the cosmic vari- 
ance, and also assumes that the connected 4-point function 
is small. Strictly, this method is optimal for testing the hy- 
pothesis that the field is gaussian, but it should be very 
close to optimal for practical cases, since the expectation is 
that the bispectrum will be small. If the assumption is not 
justified, and the bispectrum is not small compared with 
the cosmic variance, detection will not be difficult in any 
event. If this turns out to be the case, it will be important 
to check that the estimator remains unbiased in the case of 
large intrinsic bispectrum. 

In the gaussian approximation, {xiXjX k } — 0, and we 
use Wick's theorem to write 

(XiXjX k Xi'Xj'X k r) = £,ij£,ki'£,j'k' 

+ permutations (15 terms). (13) 

where we have defined the 2-point function of the tempera- 
ture field: 

= (xiXj) = ^^-°^os ~iij)Wt + Nv (14) 
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Pe are Legendre polynomials and Nij is the noise covariance 
matrix. We can then compute the covariance matrix for y a : 



V act > = {y a y a >) = {&j£,ki'£,j'k> + perm.) Ef jh El 



(15) 



and we have from now adopted the summation convention 
for repeated pixel indices, and also, unless stated otherwise, 
a indices. The products of £ terms are of two types: there are 
6 terms like Cii'Cjj'Cfefc'i with one of each pair of subscripts 
from each distinct E, and 9 terms of the form ^ij^ki'^j'k' 
where only one £ mixes dashed and undashed indices. Since 
the Efj h are symmetric to permutations in the {ijk}, we get 



V aa ' — (,u> (&^jj'£,kk' + Qijkij'k') Eij k E"j, k i 



(16) 



We now minimise the variance V aa (not summed) with re- 
spect to E"j k , subject to a normalisation constraint on E to 
ensure it is not driven to zero. We choose 



^ijk^ijk 

giving 



(17) 



(18) 



£,ii> (6£,jj'£,kk> + 9?jfcG' k') E°t ji k , — XR° jk 

where A is a Lagrange multiplier. Multiplying by ^J,}^^, 
and summing over ij, gives 

^kk'Eii> jllk , +9£ J i k i5p, k E'j"„ :j , k , = X^'J^^^Ri 



-1 r>a 
^3" J^t"i I Hjk- 



(19) 



5^ is a Kronecker delta function. Defining r",^ = £.,^R" jk , 
we get, after some relabelling of indices, 

§£,ijEi"jk + ySikS.jmEiiimj = A^ 1 r"/ fcj -. (20) 

Taking the trace of this equation, and inserting it into the 
second term gives the coefficients we require for y a to have 
its minimum error bar: 

3 

33' ^kk' 



1 



"i' 4' k' • 



(21) 



2 + 3N Sjfc ^' 

In this expression, A has been set to unity for convenience, 
and N is the number of pixels in the map. Provided that A 
is finite, its value does not affect the bispectrum information 
content of y a : any multiple of y a contains the same infor- 
mation as y a itself. This makes obvious sense, and can be 
shown formally via the Fisher information matrix (below). 



3 LOSSLESS ESTIMATOR OF THE 
BISPECTRUM 

In order to estimate how well the y a will perform in esti- 
mating the desired parameters B a , we compute the Fisher 
information matrix (Tegmark, Taylor & Heavens 1997) 



F, 



d 2 Inp 



(22) 



dB a B a i 

where p is the posterior probability distribution for the pa- 
rameters (equal to the likelihood, if uniform priors for the 
parameters are assigned) . For a data vector with components 
with means fi and covariance matrix C the Fisher matrix is 



-Tr 



i dC n - 



dC 
dB„, 



+ 2C~ 



d/j, djj, 



dB a 8B n 



(23) 



The error on the parameters B a is contained in this matrix: 
if all other parameters are known, the minimum error is 
the conditional one, as a = l/VF aa - If all parameters are 



to be estimated from the data, then the appropriate error 
is the marginal error This assumes the probability 

surface is adequately approximated by a second-order Taylor 
expansion at the peak. 

As expected for the 'near-gaussian' approximation, the 
covariance matrix for either the triplets XiXjX k or the y a 
does not depend on the parameters to be estimated - the 
Fisher matrix is determined only by the derivatives of the 
mean values. For the triplets, 



P — C~ R Q R a 

c aa' ^ ijki 1 j'k 1 ^ijk^i' j'k' 

— {^i}j'£,kk> + 9£jk£,j> k > 



RijkRi'j'k' (24) 



A similar procedure to the computation of E above gives 
the inverse covariance matrix 



C 



1.1 k i' j 



2(2 + 37V) 



(25) 



This is an important simplification for computational rea- 
sons: without it, the inversion of an N A x N A matrix (C) 
would be impractically slow. Decomposing its inverse into 
N x N matrices £ _1 is much faster. 

Since we can recreate the original temperature map {xi} 
from the triplets, ( |24| ) is also the Fisher information matrix 
for the original, entire map. We now make a comparison 
with the Fisher matrix for the y a - are their errors as small 
as is possible with the entire map? The covariance matrix 
for the y a ( |lrj ) is, for the optimal choice of E coefficients 



3k^j'k' 
-1 
j' 



A ^ijki" 3 n k" "■%" j"k'"^i'j'k'i'"j'"k'" j"'k'' 
= F i 

Since the ensemble average of y a is 

(Va) — B a i Rij k Eij k 

i.e. we obtain the simple result 

{l/a) = B a r F aa ' . 

Consequently, we can use the vector 
B a = F^,y a , 

as an estimator of the bispectrum. It is unbiased: 



(26) 

(27) 
(28) 
(29) 

(30) 



and the bispectrum estimates also have calculable covariance 
properties: 

C aa i = (B a B a i) — (B a ){B a i) 

F a a"E a r a m {y^'Va 1 ") F aa ii F a , a n, {y a ") {y a m) 

= F^l.F - , 1 ,„F „ a, 

± aa" 1 - a'a'" a" a'" 

= F£. (31) 



This also proves that the estimators are optimal, by the 
Fisher-Cramer-Rao inequality. 
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4 APPLICATION TO COBE 4- YEAR DMR 
DATA: 

We illustrate the method by applying the method to the 
COBE DMR 4-year data, focussing on measuring low-order 
coefficients. For this experiment, the width of the approx- 
imately gaussian beam is a — 3.2° (Wright et al. 1992). 
The method is computationally expensive, and is in the 
process of being optimised, but for the moment, the ap- 
proach taken is to average the ~ 4000 unmasked pixels 
of the COBE dataset into larger pixels of roughly 12 de- 
grees square. This introduces an additional effective gaussian 
smoothing for large scale coefficients of 12° /VT2, which is 
added in quadrature to the COBE beam. We shall see the 
effect of this additional pixelisation below, in the form of an 
error bar larger than that of cosmic variance, especially for 
the higher harmonics. The effective beam suppresses con- 
tributions to the bispectrum from harmonics with £ larger 
than £(£+l)a^ s /2 ~ 1, i.e. £ > 17. We therefore truncate the 
summations in the estimator for B, and the Fisher matrix 
(scalar in this case) at the conservative limit of f max = 40. 
Pixel errors are taken from the COBE DMR datasets, and 
assumed to be independent. Averaging is done by inverse- 
variance weighting. The power spectrum is assumed to have 
a Harrison- Zeldovich spectrum, and the normalisation is 
Qrms = 18.4nK (Gorski et al. 1996). Coefficients are cho- 
sen for which non-Gaussian predictions are quoted in (Luo 
1994a) 



5(2 2 2,11 - 2) 
5(444,22 - 4) 
5(666,33 - 6) 
5(888,44 - 8) 
5(101010,5 510) 



(2.5 ±4.1) x 10~ 15 
(5.4 ±6.5) x 10 -15 
(12.4 ±8.9) x 10~ 15 
(1.5 ± 11.6) x 10~ 15 
(-16.3 ± 16.6) x 10~ 15 



(32) 



from which we see that there is no evidence of non- 
gaussianity, at least for these coefficients. Note that for the 
first bispectrum estimate, the cosmic variance corresponds 
to an error of 1.4 x 10~ 15 (e.g. Luo 1994a). These very 
large-scale modes are not ideal for this sort of study - a 
higher-resolution experiment generally has higher signal-to- 
noise (Luo 1994a). A recent preprint (Ferreira, Magueijo & 
Gorski 1998) claims a detection of non-gaussianity at I = 16, 
a mode which is not probed here. 



5 SUMMARY 

In this paper, we see that it is possible to construct an esti- 
mator for the bispectrum which is lossless, in the sense that 
it contains as much information on the bispectrum as the en- 
tire map. It is also unbiased, and the covariance properties 
of the estimators are calculable. The estimator involves one 
approximation - that the departures from gaussianity are 
small. In this limit, there is no other method which will lead 
to smaller error bars. The fact that the covariance prop- 
erties are known is important in practical cases, because 
the bispectrum may well be small in comparison to cosmic 
variance, so a single estimate is unlikely to be sufficient to 
rule out many non-gaussian models. Many estimates (with 
different fife4mim2W3) would be required. As a test for 



non-gaussianity, this method has the advantage that confi- 
dence levels can be computed analytically, without recourse 
to Monte Carlo simulation. 

The notion of an optimal method is defined rather pre- 
cisely in terms of information content and bias, but the is- 
sue of whether a method is good or not is wider than this. 
There is no doubt that the number of computations required 
to do this analysis is very large, dominated for reasonable 
pixel counts by computation of the R coefficients. This can 
be aided by precomputing Clebsch-Gordan coefficients (Ver- 
maak, Vermaak & Miller 1984) and using a packed-storage 
algorithm, and by using parallel computers, for which this 
problem is ideal. However, it is clear that it will not be possi- 
ble to deal directly with the entire dataset from Planck with- 
out some form of pre-compression, perhaps along the lines 
above for the \ow-£ modes. For high-£ modes, subdivision 
of the sky into essentially independent datasets may be re- 
quired. This is an inherent problem for high-order statistics, 
but it may also be required even for the power spectrum. 

A further point to note is that the method deals au- 
tomatically with sky coverage which is not complete, and 
arbitrary noise correlations. These are the two crucial tests 
of any method, since they will be a feature of future high- 
quality experiments such as Planck and MAP. Three-point 
function estimators have traditionally not computed the co- 
variances directly, but rely on simulation tests to decide sig- 
nificance (e.g. Kogut et al. 1996). This is the disadvantage 
of many methods (e.g. genus, extrema correlation functions) 
for which the evaluation of errors may be difficult analyti- 
cally. Against this one has, of course, to balance speed ad- 
vantages. 

A preliminary application to the COBE 4-year data 
shows large-scale bispectrum coefficients consistent with 
zero. However, the errors are sufficiently large that these 
coefficients do not rule out non-gaussian models with confi- 
dence. 
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